NSF PAR Search | NSF Public Access Repository

The Case for External Graph Sketching

https://doi.org/10.1137/1.9781611978759.9

Bender, Michael A; Farach-Colton, Martín; Jacob, Riko; Komlós, Hanna; Tench, David; West, Evan T (January 2025, Society for Industrial and Applied Mathematics)

Full Text Available
Exploring the Landscape of Distributed Graph Sketching

Tench, David; West, Evan; Zhang, Kenny; Bender, Michael A; Delayo, Daniel; Farach-Colton, Martin; Gill, Gilvir; Seip, Tyler; Zhang, Victor (January 2025, SIAM)

Full Text Available
Exploring the Landscape of Distributed Graph Sketching

https://doi.org/10.1137/1.9781611978339.11

Tench, David; West, Evan T; Zhang, Kenny; Bender, Michael A; DeLayo, Daniel; Farach-Colton, Martín; Gill, Gilvir; Seip, Tyler; Zhang, Victor (January 2025, Society for Industrial and Applied Mathematics)

Full Text Available
Adaptive Quotient Filters

https://doi.org/10.1145/3677128

Wen, Richard; McCoy, Hunter; Tench, David; Tagliavini, Guido; Bender, Michael A; Conway, Alex; Farach-Colton, Martin; Johnson, Rob; Pandey, Prashant (October 2024, Proceedings of the ACM on Management of Data)

Filters trade off accuracy for space and occasionally return false positive matches with a bounded error. Numerous systems use filters in fast memory to avoid performing expensive I/Os to slow storage. A fundamental limitation of traditional filters is that they do not change their representation upon seeing a false positive match. Therefore, the maximum false positive rate is only guaranteed for a single query, not for an arbitrary set of queries. We can improve the filter's performance on a stream of queries, especially on a skewed distribution, if we can adapt after encountering false positives. Adaptive filters, such as telescoping quotient filters and adaptive cuckoo filters, update their representation upon detecting a false positive to avoid repeating the same error in the future. Adaptive filters require an auxiliary structure, typically much larger than the main filter and often residing on slow storage, to facilitate adaptation. However, existing adaptive filters are not practical and have not been adopted in real-world systems for two main reasons. First, they offer weak adaptivity guarantees, meaning that fixing a new false positive can cause a previously fixed false positive to come back. Second, the sub-optimal design of the auxiliary structure results in adaptivity overheads so substantial that they can actually diminish overall system performance compared to a traditional filter. In this paper, we design and implement the adaptive quotient filter, the first practical adaptive filter with minimal adaptivity overhead and strong adaptivity guarantees, which means that the performance and false-positive guarantees continue to hold even for adversarial workloads. The adaptive quotient filter is based on the state-of-the-art quotient filter design and preserves all the critical features of the quotient filter, such as cache efficiency and mergeability. Furthermore, we employ a new auxiliary structure design that results in low adaptivity overhead and makes the adaptive quotient filter practical in real systems. We evaluate the adaptive quotient filter by using it to filter queries to an on-disk B-tree database and find no negative impact on insert or query performance compared to traditional filters. Against adversarial workloads, the adaptive quotient filter preserves system performance, whereas traditional filters incur a 2× slowdown from adversaries representing as little as 1% of the workload. Finally, we show that on skewed query workloads, the adaptive quotient filter can reduce the false-positive rate 100× using negligible (1/1000th of a bit per item) space overhead.
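The adaptation idea in this abstract can be illustrated with a toy sketch. It is not the paper's quotient-filter design: the class `ToyAdaptiveFilter`, its list-based layout, and the 4-bit starting fingerprint are all hypothetical choices made for illustration. Each stored key keeps a short hash prefix in "fast memory" and the original key in an auxiliary list; when a false positive is confirmed, every colliding prefix is lengthened until it no longer matches. Because prefixes only grow, an earlier fix is never undone.

```python
import hashlib

HASH_BITS = 64  # length of the full hash that each fingerprint is a prefix of


def hash_bits(key: str) -> str:
    """Return a 64-character bit string derived from SHA-256 of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return format(int.from_bytes(digest[:8], "big"), "064b")


class ToyAdaptiveFilter:
    """Hypothetical toy: variable-length hash prefixes plus an auxiliary key list."""

    def __init__(self, base_bits: int = 4):
        self.base_bits = base_bits
        self.fingerprints = []  # stands in for fast memory: one prefix per key
        self.aux_keys = []      # stands in for slow storage: original keys, same order

    def insert(self, key: str) -> None:
        self.fingerprints.append(hash_bits(key)[: self.base_bits])
        self.aux_keys.append(key)

    def might_contain(self, key: str) -> bool:
        h = hash_bits(key)
        return any(h.startswith(fp) for fp in self.fingerprints)

    def adapt(self, false_positive: str) -> None:
        """Lengthen every fingerprint that matches a confirmed non-member."""
        h = hash_bits(false_positive)
        for i, key in enumerate(self.aux_keys):
            if key == false_positive:
                continue  # a true member must keep matching
            full = hash_bits(key)
            fp = self.fingerprints[i]
            while h.startswith(fp) and len(fp) < HASH_BITS:
                fp = full[: len(fp) + 1]
            self.fingerprints[i] = fp


filt = ToyAdaptiveFilter(base_bits=4)
members = [f"key{i}" for i in range(50)]
for m in members:
    filt.insert(m)

non_members = [f"q{i}" for i in range(1000)]
first_fp = next(q for q in non_members if filt.might_contain(q))
filt.adapt(first_fp)  # first_fp is now rejected; members still match
```

The linear scan in `might_contain` keeps the sketch short; a real filter would index fingerprints by position rather than scanning.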
Full Text Available
GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)

https://doi.org/10.1145/3643846

Tench, David; West, Evan; Zhang, Victor; Bender, Michael A; Chowdhury, Abiyaz; Delayo, Daniel; Dellas, J Ahmed; Farach-Colton, Martín; Seip, Tyler; Zhang, Kenny (September 2024, ACM Transactions on Database Systems)

Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components problem on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is an inherent limitation of this approach and is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call GraphZeppelin, uses new linear sketching data structures (CubeSketch) to solve the streaming connected components problem and as a result requires space asymptotically smaller than the space required for a lossless representation of the graph. GraphZeppelin is optimized for massive dense graphs: GraphZeppelin can process millions of edge updates (both insertions and deletions) per second, even when the underlying graph is far too large to fit in available RAM. As a result, GraphZeppelin vastly increases the scale of graphs that can be processed.
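Why a linear sketch can absorb both insertions and deletions can be shown with a deliberately simplified example. This is not CubeSketch: the single XOR accumulator per vertex, the `edge_id` encoding, and the four-vertex graph are assumptions made for illustration. Each vertex XORs together the identifiers of its incident edges, so inserting and deleting an edge are the same toggle, and XORing the sketches of a vertex set cancels every edge inside the set.

```python
NUM_VERTICES = 4  # hypothetical tiny graph


def edge_id(u: int, v: int) -> int:
    """Encode an undirected edge (u != v) as a nonzero integer."""
    a, b = min(u, v), max(u, v)
    return a * NUM_VERTICES + b


def decode(eid: int) -> tuple:
    """Invert edge_id: recover the (smaller, larger) endpoint pair."""
    return divmod(eid, NUM_VERTICES)


class VertexSketch:
    """One XOR accumulator over the ids of the edges incident to a vertex."""

    def __init__(self) -> None:
        self.acc = 0

    def toggle(self, u: int, v: int) -> None:
        self.acc ^= edge_id(u, v)

    def merged(self, other: "VertexSketch") -> "VertexSketch":
        out = VertexSketch()
        out.acc = self.acc ^ other.acc  # edges between the two sides cancel
        return out


sketches = [VertexSketch() for _ in range(NUM_VERTICES)]


def update(u: int, v: int) -> None:
    """Apply an edge insertion or deletion; both toggle the same bits."""
    sketches[u].toggle(u, v)
    sketches[v].toggle(u, v)


update(0, 1)  # insert edge 0-1
update(1, 2)  # insert edge 1-2
update(0, 2)  # insert edge 0-2
update(0, 2)  # delete edge 0-2: cancels the insertion

component = sketches[0].merged(sketches[1])  # treat {0, 1} as one supernode
cut_edge = decode(component.acc)             # the only edge leaving {0, 1}
```

A lone XOR recovers an edge only when exactly one edge leaves the merged set. Practical sketches add sampling at several rates and a checksum so that a valid edge can be recovered, with good probability, when many edges cross.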
Full Text Available
GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams

https://doi.org/10.1145/3514221.3526146

Tench, David; West, Evan; Zhang, Victor; Bender, Michael A.; Chowdhury, Abiyaz; Dellas, J. Ahmed; Farach-Colton, Martin; Seip, Tyler; Zhang, Kenny (June 2022, Proc. International Conference on Management of Data (SIGMOD))

Full Text Available
GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams

Tench, David; West, Evan; Zhang, Victor; Bender, Michael; Chowdhury, Abiyaz; Dellas, Ahmed; Farach-Colton, Martin; Seip, Tyler; Zhang, Kenny (January 2022, SIGMOD record)

Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call GraphZeppelin, uses new linear sketching data structures (CubeSketch) to solve the streaming connected components problem and as a result requires space asymptotically smaller than the space required for a lossless representation of the graph. GraphZeppelin is optimized for massive dense graphs: GraphZeppelin can process millions of edge updates (both insertions and deletions) per second, even when the underlying graph is far too large to fit in available RAM. As a result, GraphZeppelin vastly increases the scale of graphs that can be processed.
Full Text Available
Maximum Coverage in the Data Stream Model: Parameterized and Generalized

McGregor, Andrew; Tench, David; Vu, Hoa (January 2021, ICDT 2021)

Full Text Available
PredictRoute: A Network Path Prediction Toolkit

Singh, Rachee; Tench, David; Gill, Phillipa; McGregor, Andrew (January 2021, SIGMETRICS 2021)

Full Text Available
Mitigating False Positives in Filters: to Adapt or to Cache?

Bender, Michael; Das, Ratish; Farach-Colton, Martin; Mo, Tianchi; Tench, David; Wang, Yung Ping (January 2021, SIAM Symposium on Algorithmic Principles of Computer Systems)

A filter is adaptive if it achieves a false positive rate of ε on each query independently of the answers to previous queries. Many popular filters such as Bloom filters are not adaptive: an adversary could repeat a false-positive query many times to drive the false-positive rate to 1. Bender et al. [4] formalized the definition of adaptivity and gave a provably adaptive filter, the broom filter. Mitzenmacher et al. [20] gave a filter that achieves a lower empirical false-positive rate by exploiting repetitions. We prove that an adaptive filter has a lower false-positive rate when the adversary is stochastic. Specifically, we analyze the broom filter against queries drawn from a Zipfian distribution. We validate our analysis empirically by showing that the broom filter achieves a low false-positive rate on both network traces and synthetic datasets, even when compared to a regular filter augmented with a cache for storing frequently queried items.
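The repeated-query attack on a non-adaptive filter can be reproduced in a few lines. The following is a minimal sketch with assumed parameters (64 bits, 2 hash functions, 20 stored keys, SHA-256-derived positions), none of which come from the paper. It measures the false-positive rate over distinct non-member probes, then replays a single discovered false positive.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; its bits never change in response to queries."""

    def __init__(self, num_bits: int = 64, num_hashes: int = 2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key: str):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = True

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] for p in self._positions(key))


bloom = BloomFilter()
for i in range(20):
    bloom.insert(f"key{i}")

# Rate over many distinct non-member probes (keys "q..." were never inserted).
probes = [f"q{i}" for i in range(10_000)]
random_fp_rate = sum(bloom.might_contain(q) for q in probes) / len(probes)

# An adversary finds one false positive and simply repeats it.
bad_query = next(q for q in probes if bloom.might_contain(q))
repeats = 100
adversarial_fp_rate = sum(bloom.might_contain(bad_query) for _ in range(repeats)) / repeats
```

Because the filter's state is fixed, every replay of `bad_query` returns the same wrong answer, so `adversarial_fp_rate` is 1 regardless of how low `random_fp_rate` is.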
Full Text Available
